Distributed Pagerank for P2P Systems
Abstract
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm for peer-to-peer (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. It also enables incremental computation of pageranks as documents are added to or deleted from the network. Incremental updates keep pageranks continuously accurate, whereas the current centralized web crawl and computation over Internet documents requires several days. This suggests that the distributed algorithm could replace the centralized, web-crawler-based implementation of pagerank computation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly for large systems (1% accuracy in 10 iterations), although the time for an iteration may be long. The incremental computation resulting from the addition of a single document converges extremely rapidly, typically requiring update path lengths of under 15 nodes even for large networks and very accurate solutions. This implementation of Pagerank provides a uniform ranking scheme for documents in P2P systems, and its integration with P2P keyword search provides one solution to the network traffic problems caused by returning document hits. In basic P2P keyword search, all document hits must be returned to the querying node, causing heavy network traffic. An incremental algorithm for P2P keyword search, in which document hits are sorted by pagerank and returned incrementally to the querying node, is proposed and evaluated. Integrating this algorithm into P2P keyword search can produce dramatic benefits both in effectiveness for users and in reduced network traffic. The incremental search algorithm provided approximately a ten-fold reduction in network traffic for two-word and three-word queries.
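The chaotic (asynchronous) iteration described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: it assumes the non-normalized update R(d) = (1 - c) + c * sum(R(j) / N(j)) over in-linking documents j, a damping factor c = 0.85, and a relative-change threshold `epsilon` below which a document stops sending updates. The function name `chaotic_pagerank`, the work queue standing in for network messages, and the example graph are all hypothetical.

```python
from collections import deque


def chaotic_pagerank(out_links, damping=0.85, epsilon=1e-4):
    """Simulate asynchronous pagerank updates driven by messages.

    out_links maps each document to the list of documents it links to.
    Every link target must also appear as a key (assumption of this sketch).
    Returns (rank, messages_sent).
    """
    # received[d][j] holds the latest contribution that j sent to d.
    received = {d: {} for d in out_links}
    rank = {}
    last_sent = {}            # rank value each document last propagated
    queue = deque(out_links)  # documents waiting to recompute their rank
    queued = set(out_links)
    messages = 0

    while queue:
        d = queue.popleft()
        queued.discard(d)
        # Recompute from whatever contributions have arrived so far.
        rank[d] = (1 - damping) + damping * sum(received[d].values())

        # Stay silent if the rank barely moved since the last update sent.
        prev = last_sent.get(d)
        if prev is not None and abs(rank[d] - prev) <= epsilon * prev:
            continue
        last_sent[d] = rank[d]

        # "Send" an update message to every document that d links to.
        targets = out_links[d]
        for t in targets:
            received[t][d] = rank[d] / len(targets)
            messages += 1
            if t not in queued:
                queue.append(t)
                queued.add(t)

    return rank, messages


# Hypothetical four-document network.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, sent = chaotic_pagerank(graph)
for doc in sorted(ranks):
    print(doc, round(ranks[doc], 3))
print("update messages:", sent)
```

No document waits for a global iteration barrier here; each one recomputes whenever new contributions arrive, which is the property that makes the scheme suitable for peers that run independently. Under the same assumptions, an incremental update would amount to adding the new document to `out_links` and queueing only that document, so that updates spread outward until the changes fall below `epsilon`.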
Similar resources
SubgraphRank: PageRank Approximation for a Subgraph or in a Decentralized System
PageRank, a ranking metric for hypertext web pages, has received increasing interest. As the Web has grown in size, computing PageRank scores on the whole web using centralized approaches faces challenges in scalability. Distributed systems like peer-to-peer (P2P) networks are employed to speed up PageRank. In a P2P system, each peer crawls web fragments independently. Hence the web fragment on ...
Knowing Where to Search: Personalized Search Strategies for Peers in P2P Networks
Optimizing and focusing search and results ranking in P2P networks becomes more and more important with the increasing size of these networks. Even though a few approaches have already started to investigate the computation of PageRank-like values in P2P environments, none so far has investigated how personalization could be added to it. This paper tackles the problem of distributedly computing...
P2P Network Trust Management Survey
Peer-to-peer (P2P) applications are no longer limited to home users and are starting to be accepted in academic and corporate environments. While file sharing and instant messaging applications are the most traditional examples, they are no longer the only ones benefiting from the potential advantages of P2P networks. For example, network file storage, data transmission, distributed computing, and co...
Efficiently Handling Dynamics in Distributed Link Based Authority Analysis
Link based authority analysis is an important tool for ranking resources in social networks and other graphs. Previous work has presented JP, a decentralized algorithm for computing PageRank scores. The algorithm is designed to work in distributed systems, such as peer-to-peer (P2P) networks. However, the dynamics of the P2P networks, one of its main characteristics, is currently not handled ...
Towards a Decentralized Search Architecture for the Web and P2P Systems
Search engines are among the most important applications or services on the web. Most existing successful search engines use a centralized architecture and global ranking algorithms to generate the ranking of documents crawled in their databases, for example, Google's PageRank. However, global ranking of documents has two potential problems: high computation cost, and potentially poor rankings....